Exploiting Load Latency Tolerance in Dynamically Scheduled Processors

Authors

  • Srikanth T. Srinivasan
  • Alvin R. Lebeck
Abstract

This paper provides quantitative measurements of load latency tolerance in a dynamically scheduled processor and presents one cache management technique that exploits this information to improve overall performance. We determine the latency of each memory load operation such that the number of instructions issued per cycle (IPC) is comparable to that of an ideal memory system that satisfies all requests in a single cycle. Our measurements reveal that to produce IPC values within 16% of the ideal memory system, between 50% and 90% of loads need to be satisfied within a single cycle and that up to 50% can be satisfied in as many as 8 cycles (an artificially imposed upper limit), depending on the benchmark and processor configuration. Load latency tolerance is largely determined by the number of dependent operations and whether a branch is dependent on the load. This paper presents an all-hardware approach to obtain this information and to use it in making cache replacement decisions. Simulation results indicate this technique can improve IPC values by up to 8%.
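
The abstract names two signals for load latency tolerance (the number of dependent operations and whether a branch depends on the load) and says they feed cache replacement decisions. The following is a minimal software sketch of that idea, not the paper's hardware design. The names `LoadInfo`, `is_critical`, `CacheSet`, and the value of `DEPENDENT_THRESHOLD` are all hypothetical and chosen only for illustration, as is the choice to layer the criticality hint on top of LRU.

```python
from dataclasses import dataclass

# Hypothetical cutoff for "many dependents"; not a value from the paper.
DEPENDENT_THRESHOLD = 3


@dataclass
class LoadInfo:
    dependents: int      # number of operations that depend on the load
    feeds_branch: bool   # True if a branch depends on the load


def is_critical(load: LoadInfo, threshold: int = DEPENDENT_THRESHOLD) -> bool:
    """Assumed heuristic: a load tolerates little latency if a branch
    or at least `threshold` operations depend on it."""
    return load.feeds_branch or load.dependents >= threshold


class CacheSet:
    """One set of a set-associative cache, kept in LRU order.

    Eviction prefers the least recently used line that was last brought in
    or touched only by latency-tolerant loads (an illustrative policy).
    """

    def __init__(self, ways: int):
        self.ways = ways
        self.lines = []  # (tag, critical) pairs, least recently used first

    def access(self, tag: int, load: LoadInfo) -> bool:
        """Return True on a hit, False on a miss (the line is then filled)."""
        critical = is_critical(load)
        for i, (t, c) in enumerate(self.lines):
            if t == tag:
                # Hit: move to the most recently used position and
                # remember if any critical load has touched the line.
                self.lines.pop(i)
                self.lines.append((tag, c or critical))
                return True
        if len(self.lines) >= self.ways:
            self._evict()
        self.lines.append((tag, critical))
        return False

    def _evict(self) -> None:
        # Evict the oldest non-critical line; fall back to plain LRU
        # when every line in the set is marked critical.
        for i, (_, c) in enumerate(self.lines):
            if not c:
                self.lines.pop(i)
                return
        self.lines.pop(0)
```

With a two-way set, a line filled by a branch-feeding load survives a later miss even though it is the least recently used, because the non-critical line is evicted first. A real design would also need some way to age out stale critical marks, which this sketch omits.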


Similar resources

A Comparative Survey of Load Speculation Architectures

Load latency remains a significant bottleneck in dynamically scheduled pipelined processors. Load speculation techniques have been proposed to reduce this latency. Dependence Prediction can be used to allow loads to be issued before all prior store addresses are known, and to predict exactly which store a load should wait upon. Address Prediction can be used to allow a load to bypass the calcula...


The Impact of Exploiting Instruction-Level Parallelism on Shared-Memory Multiprocessors

Current microprocessors incorporate techniques to aggressively exploit instruction-level parallelism (ILP). This paper evaluates the impact of such processors on the performance of shared-memory multiprocessors, both without and with the latency-hiding optimization of software prefetching. Our results show that, while ILP techniques substantially reduce CPU time in multiprocessors, they are les...


Precise Instruction Scheduling

Pipeline depths in high-performance dynamically scheduled microprocessors are increasing steadily. In addition, level 1 caches are shrinking to meet latency constraints but more levels of cache are being added to mitigate this performance impact. Moreover, the growing schedule-to-execute-window of deeply pipelined processors has required the use of speculative scheduling techniques. When these e...


Improving Latency Tolerance of Multithreading through Decoupling

The increasing hardware complexity of dynamically scheduled superscalar processors may compromise the scalability of this organization to make an efficient use of future increases in transistor budget. SMT processors, designed over a superscalar core, are therefore directly concerned by this problem. This work presents and evaluates a novel processor microarchitecture which combines two paradi...


Aligned Scheduling: Cache-Efficient Instruction Scheduling for VLIW Processors

The performance of statically scheduled VLIW processors is highly sensitive to the instruction scheduling performed by the compiler. In this work we identify a major deficiency in existing instruction scheduling for VLIW processors. Unlike most dynamically scheduled processors, a VLIW processor with no load-use hardware interlocks will completely stall upon a cache-miss of any of the operations...



Publication date: 1998